
    Unsupervised naming of persons in TV broadcasts: using written names, pronounced names, or both? [Nommage non supervisé des personnes dans les émissions de télévision. Utilisation des noms écrits, des noms prononcés ou des deux ?]

    National audience. Person identification in TV broadcasts is a valuable tool for indexing this type of video, but the use of biometric models is not a viable option without a priori knowledge of the people present in the videos. Pronounced or written names can provide a list of hypothesis names. We compare the potential of these two modalities (pronounced or written names) for extracting the names of the people who speak and/or appear on screen. Pronounced names offer a larger number of citation occurrences, but transcription and detection errors for these names halve the potential of this modality. Written names benefit from the steadily improving quality of the videos and are more easily detected. Moreover, affiliating written names with speakers/faces remains simpler than for pronounced names.

    Unsupervised Speaker Identification in TV Broadcast Based on Written Names

    International audience. Identifying speakers in TV broadcast in an unsupervised way (i.e. without biometric models) is a solution for avoiding costly annotations. Existing methods usually use pronounced names as a source of names for identifying the speech clusters provided by a diarization step, but this source is too imprecise to provide sufficient confidence. To overcome this issue, another source of names can be used: the names written in title blocks in the image track. We first compare these two sources of names on their ability to provide the names of the speakers in TV broadcast. This study shows that written names are the more interesting source, thanks to their high precision for identifying the current speaker. We also propose two approaches for finding speaker identities based only on the names written in the image track. With the "late naming" approach, we propose different ways of propagating written names onto clusters. Our second proposition, "early naming", modifies the speaker diarization module (agglomerative clustering) by adding constraints that prevent two clusters with different associated written names from being merged. These methods were tested on phase 1 of the REPERE corpus, containing 3 hours of annotated videos. Our best "late naming" system reaches an F-measure of 73.1%, and "early naming" improves over this result both in terms of identification error rate and in terms of stability of the clustering stopping criterion. By comparison, a mono-modal, supervised speaker identification system with 535 speaker models trained on matching development data and additional TV and radio data only reaches an F-measure of 57.2%.
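
    To make the "early naming" idea above concrete, the sketch below shows agglomerative clustering with a cannot-link constraint on written names: two clusters already associated with different names are never merged. The cluster representation, the distance function and the stopping threshold are placeholders for illustration, not the actual components of the system described in the abstract.

        # Hypothetical sketch of "early naming": agglomerative clustering of speech
        # turns in which clusters associated with different written names may never
        # be merged. Distances, features and the stopping threshold are placeholders.

        def constrained_clustering(turns, distance, threshold):
            """turns: list of (turn_id, written_name_or_None);
            distance(a, b) -> float between two clusters (lists of turn_ids)."""
            # Each cluster keeps its member turns and the written names seen so far.
            clusters = [{"turns": [t], "names": {n} if n else set()} for t, n in turns]

            while len(clusters) > 1:
                best = None
                for i in range(len(clusters)):
                    for j in range(i + 1, len(clusters)):
                        a, b = clusters[i], clusters[j]
                        # Constraint: never merge clusters labelled with different names.
                        if a["names"] and b["names"] and a["names"] != b["names"]:
                            continue
                        d = distance(a["turns"], b["turns"])
                        if best is None or d < best[0]:
                            best = (d, i, j)
                if best is None or best[0] > threshold:
                    break  # no allowed merge left, or closest pair too distant
                _, i, j = best
                clusters[i]["turns"] += clusters[j]["turns"]
                clusters[i]["names"] |= clusters[j]["names"]
                del clusters[j]
            return clusters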

    Unsupervised naming of persons in TV broadcasts: a review of the potential of each modality [Nommage non-supervisé des personnes dans les émissions de télévision: une revue du potentiel de chaque modalité]

    National audience. Person identification in TV broadcasts is a valuable tool for indexing these videos, but the use of biometric models is not a sustainable option without a priori knowledge of the people present in the videos. Names pronounced in the audio or written on screen can provide a list of hypothesis names. We compare the potential of these two modalities (pronounced or written names) for extracting the true names of the speakers and/or faces. Pronounced names offer many citation instances, but transcription and detection errors for these names halve the potential of this modality. Written names benefit from improving video quality and are easy to detect. Affiliating written names with speakers/faces is also simpler than for pronounced names.

    Active Selection with Label Propagation for Minimizing Human Effort in Speaker Annotation of TV Shows

    International audience. In this paper, an approach minimizing human involvement in the manual annotation of speakers is presented. At each iteration, a selection strategy chooses the most suitable speech track for manual annotation; the resulting label is then associated with all the tracks in the cluster that contains it. The study makes use of a system that propagates speaker track labels, based on agglomerative clustering with constraints. Several different unsupervised active-learning selection strategies are evaluated. Additionally, the presented approach can be used to efficiently generate sets of speech tracks for training biometric models; in this case, both the length of the speech tracks available for a given person and their purity are taken into consideration. The system is evaluated on the REPERE video corpus. Along with the speech tracks extracted from the videos, an optical character recognition system was adapted to extract the names of potential speakers, which were used as the 'cold start' for the selection method.
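
    As an illustration of the selection-and-propagation loop described above, the sketch below picks one speech track per iteration, asks for its label and propagates that label to the whole cluster. The particular strategy shown (longest track of a still-unlabelled cluster) and the track fields are assumptions, not the strategies evaluated in the paper.

        # Hypothetical sketch of the active-selection loop: at each iteration one
        # speech track is chosen for manual annotation and its label is propagated
        # to every track of the cluster containing it.

        def annotate_actively(clusters, ask_human, n_iterations):
            """clusters: list of lists of tracks; each track is a dict with
            'track_id' and 'duration'. ask_human(track) -> person name."""
            labels = {}  # track_id -> manually assigned or propagated label
            for _ in range(n_iterations):
                # Selection strategy (one of many possible): the cluster holding the
                # longest still-unlabelled track, then its longest track.
                candidates = [c for c in clusters
                              if not any(t["track_id"] in labels for t in c)]
                if not candidates:
                    break
                cluster = max(candidates, key=lambda c: max(t["duration"] for t in c))
                track = max(cluster, key=lambda t: t["duration"])
                name = ask_human(track)            # one manual annotation per iteration
                for t in cluster:                  # propagate to the whole cluster
                    labels[t["track_id"]] = name
            return labels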

    Automatic propagation of manual annotations for multimodal person identification in TV shows

    International audience. In this paper, an approach to propagating human annotations for person identification in a multimodal context is proposed. A system combining speaker diarization and face clustering is used to produce multimodal clusters. Whole multimodal clusters, rather than single tracks, are then annotated, with labels spread by propagation. An optical character recognition system provides the initial annotation. Four different strategies for selecting annotation candidates are tested. The initial results of annotation propagation are promising: with a proper active-learning selection strategy, the human annotator's involvement could be reduced even further.
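
    A rough sketch of how speaker and face tracks could be grouped into the multimodal clusters mentioned above is given below; the attachment rule (maximum temporal overlap with a speaker cluster) and the data layout are illustrative assumptions, not the paper's actual clustering.

        # Hypothetical sketch: a face track is attached to the speaker cluster with
        # which it overlaps most in time, producing simple multimodal clusters.

        def build_multimodal_clusters(speaker_clusters, face_tracks, min_overlap=1.0):
            """speaker_clusters: {cluster_id: [(start, end), ...]} speech turns;
            face_tracks: [(face_id, start, end)].
            Returns {cluster_id: [face_id, ...]}."""
            def overlap(a, b):
                return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

            multimodal = {cid: [] for cid in speaker_clusters}
            for face_id, start, end in face_tracks:
                scores = {cid: sum(overlap((start, end), turn) for turn in turns)
                          for cid, turns in speaker_clusters.items()}
                best = max(scores, key=scores.get, default=None)
                if best is not None and scores[best] >= min_overlap:
                    multimodal[best].append(face_id)
            return multimodal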

    Towards a better integration of written names for unsupervised speakers identification in videos

    International audience. Existing methods for unsupervised identification of speakers in TV broadcast usually rely on the output of a speaker diarization module and try to name each cluster using names provided by another source of information; we call this "late naming". Written names extracted from title blocks tend to yield high-precision identification, but they cannot correct errors made during the clustering step. In this paper, we extend our previous "late naming" approach in two ways: "integrated naming" and "early naming". While "late naming" relies on a speaker diarization module optimized for diarization alone, "integrated naming" jointly optimizes speaker diarization and name propagation in terms of identification errors. "Early naming" modifies the speaker diarization module by adding constraints that prevent two clusters with different written names from being merged. While "integrated naming" yields identification performance similar to "late naming" (with better precision), "early naming" improves over this baseline both in terms of identification error rate and in terms of stability of the clustering stopping criterion.
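
    The sketch below illustrates one plausible form of "late naming": after diarization, each speaker cluster receives the written name whose on-screen display co-occurs longest with the cluster's speech turns. The co-occurrence criterion is an assumption used for illustration; the papers explore several propagation variants.

        # Hypothetical sketch of "late naming": name each diarization cluster with
        # the written name that co-occurs with it for the longest total duration.

        def late_naming(clusters, written_names):
            """clusters: {cluster_id: [(start, end), ...]} speech turns;
            written_names: [(name, start, end)] title-block displays.
            Returns {cluster_id: name or None}."""
            def overlap(a, b):
                return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

            naming = {}
            for cid, turns in clusters.items():
                scores = {}
                for name, start, end in written_names:
                    cooc = sum(overlap(turn, (start, end)) for turn in turns)
                    if cooc > 0:
                        scores[name] = scores.get(name, 0.0) + cooc
                naming[cid] = max(scores, key=scores.get) if scores else None
            return naming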

    Unsupervised naming of speakers in broadcast TV: using written names, pronounced names or both?

    International audience. Person identification in TV broadcast videos is a valuable tool for indexing them. However, the use of biometric models is not a very sustainable option without a priori knowledge of the people present in the videos. Pronounced names (PN) and written names (WN) on the screen can provide hypothesis names for the speakers. We propose an experimental comparison of the potential of these two modalities (pronounced or written names) for extracting the true names of the speakers. Pronounced names offer many citation instances, but transcription and named-entity detection errors halve the potential of this modality. On the contrary, written-name detection benefits from improved video quality and is nowadays robust and efficient enough to name speakers. Oracle experiments on the mapping between written names and speakers also show the complementarity of the PN and WN modalities.
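
    The sketch below shows one possible oracle measure of a modality's "potential": the share of speech time for which the true speaker's name is hypothesised by that modality (WN or PN). This is an illustrative definition under that assumption, not the exact protocol of the paper.

        # Hypothetical oracle measure: fraction of speech time whose true speaker's
        # name is available from a given name source.

        def oracle_potential(speech_turns, available_names):
            """speech_turns: [(true_speaker_name, start, end)];
            available_names: set of names hypothesised by one modality."""
            total = sum(end - start for _, start, end in speech_turns)
            covered = sum(end - start for name, start, end in speech_turns
                          if name in available_names)
            return covered / total if total else 0.0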

    The CAMOMILE collaborative annotation platform for multi-modal, multi-lingual and multi-media documents

    In this paper, we describe the organization and the implementation of the CAMOMILE collaborative annotation framework for multimodal, multimedia, multilingual (3M) data. Given the versatile nature of the analyses that can be performed on 3M data, the structure of the server was kept intentionally simple in order to preserve its genericity, relying on standard Web technologies. Layers of annotations, defined as data associated with a media fragment from the corpus, are stored in a database and can be managed through standard interfaces with authentication. Interfaces tailored to the task at hand can then be developed in an agile way, relying on simple but reliable services for the management of the centralized annotations. We then present our implementation of an active learning scenario for person annotation in video, relying on the CAMOMILE server; during a dry-run experiment, the manual annotation of 716 speech segments was propagated to 3504 labeled tracks. The code of the CAMOMILE framework is distributed as open source.
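
    The sketch below illustrates the data model suggested by this description: an annotation is a piece of data attached to a fragment of a medium, and a layer groups annotations of the same kind. The classes and field names are illustrative assumptions, not the actual CAMOMILE schema or API.

        # Minimal illustrative data model: layers of annotations, each annotation
        # attached to a media fragment. Not the real CAMOMILE schema.

        from dataclasses import dataclass, field
        from typing import Any, List


        @dataclass
        class Annotation:
            medium_id: str   # which video/audio document the fragment belongs to
            fragment: tuple  # e.g. (start_time, end_time) in seconds
            data: Any        # task-specific payload, e.g. a speaker name


        @dataclass
        class Layer:
            corpus_id: str
            name: str        # e.g. "speaker identity" or "written names"
            annotations: List[Annotation] = field(default_factory=list)

            def add(self, medium_id, fragment, data):
                self.annotations.append(Annotation(medium_id, fragment, data))


        # Example: a manual speaker label on a 4-second speech segment.
        layer = Layer("REPERE", "speaker identity")
        layer.add("video_001", (12.0, 16.0), {"person": "firstname_lastname"})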
